Learning to soar: Resource-constrained exploration in reinforcement learning
Authors
Abstract
This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(λ), uses a Gaussian process regression model to estimate the value function in a reinforcement learning framework. The Gaussian process also provides a variance on these estimates, which is used to measure the contribution of future observations to the Gaussian process value function model in terms of information gain. To avoid myopic exploration, we developed a resource-weighted objective function that combines an estimate of the future information gain, computed by an action rollout, with the estimated value function to generate directed explorative action sequences. A number of modifications and computational speed-ups to the algorithm are presented, along with a standard GP-SARSA(λ) implementation with ε-greedy exploration, to compare the respective learning performances. The results show that, under this objective function, the learning agent is able to continue exploring for better state-action trajectories when platform energy is high and to follow conservative energy-gaining trajectories when platform energy is low.
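As a concrete illustration, the following is a minimal sketch of the mechanism the abstract describes, not the authors' implementation: a GP models the value function, its predictive variance yields an information-gain term for candidate action rollouts, and an energy-dependent weight trades exploration off against exploitation. All names (GPValueModel, rollout_info_gain, resource_weighted_score) are hypothetical, scikit-learn's GP stands in for the paper's GP machinery, and the linear energy weighting is an illustrative choice rather than the paper's exact objective.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class GPValueModel:
    """GP regression over state-action features, giving value estimates and variances."""
    def __init__(self):
        self.gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
        self.X, self.y = [], []

    def update(self, sa, td_target):
        # Add a (state-action, TD target) pair and refit the GP.
        self.X.append(sa)
        self.y.append(td_target)
        self.gp.fit(np.array(self.X), np.array(self.y))

    def predict(self, sa):
        mean, std = self.gp.predict(np.array([sa]), return_std=True)
        return mean[0], std[0] ** 2   # value estimate and its variance

def rollout_info_gain(model, sa_sequence, noise_var=1e-3):
    # Non-myopic exploration term: sum the entropy reduction a noisy observation
    # would give at each step of a candidate action rollout. (Treating steps as
    # independent is a simplification; exact sequential gain would condition on
    # the earlier fantasy observations.)
    return sum(0.5 * np.log(1.0 + model.predict(sa)[1] / noise_var)
               for sa in sa_sequence)

def resource_weighted_score(model, sa_sequence, energy, energy_max):
    # High platform energy -> weight the information-gain term heavily;
    # low energy -> fall back to conservative, value-maximizing behaviour.
    q_mean, _ = model.predict(sa_sequence[0])
    w = np.clip(energy / energy_max, 0.0, 1.0)
    return (1.0 - w) * q_mean + w * rollout_info_gain(model, sa_sequence)
```

With a score of this form, a planner would pick the candidate action sequence maximizing resource_weighted_score; as energy drains, the exploration weight shrinks and the agent reverts to value-maximizing, energy-gaining trajectories, matching the behaviour the abstract reports.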
Similar resources
Resource Constrained Exploration in Reinforcement Learning
This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an energy-limited agent that must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...
Investigating the Soar-RL Implementation of the MAXQ Method for Hierarchical Reinforcement Learning
Discussed in greater detail below, Soar-RL is the integration of the machine learning method of reinforcement learning into Soar, a generalized architecture. The MAXQ method for hierarchical reinforcement learning [1] greatly influenced the design of the hierarchical reinforcement learning components of Soar-RL [2]. In its pre-release form, it is prudent to question the merits of this union: w...
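For reference, the MAXQ decomposition the excerpt refers to splits the value of a parent task into a subtask value plus a completion term; a minimal table-based sketch with illustrative names:

```python
# MAXQ value decomposition: Q(i, s, a) = V(a, s) + C(i, s, a), where V is the
# value of completing subtask a from state s and C is the completion value of
# the parent task i afterwards. V and C are hypothetical lookup tables here.
def maxq_q_value(parent, state, subtask, V, C):
    return V[(subtask, state)] + C[(parent, state, subtask)]

def maxq_v_value(task, state, subtasks, V, C):
    # For composite tasks, V(i, s) = max_a Q(i, s, a) over the available subtasks.
    return max(maxq_q_value(task, state, a, V, C) for a in subtasks)
```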
Strategies for Affect-Controlled Action-Selection in Soar-RL
Reinforcement learning (RL) agents can benefit from adaptive exploration/exploitation behavior, especially in dynamic environments. We focus on regulating this exploration/exploitation behavior by controlling the action-selection mechanism of RL. Inspired by psychological studies which show that affect influences human decision making, we use artificial affect to influence an agent’s action-sel...
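One common way to realize affect-controlled action selection is to let an affect signal modulate the softmax temperature; this is an assumption for illustration, not necessarily the mechanism of this paper:

```python
import numpy as np

def affect_modulated_softmax(q_values, affect, t_min=0.05, t_max=1.0, rng=None):
    # Illustrative rule: low (negative) affect raises the temperature (more
    # exploration), high (positive) affect lowers it (more exploitation).
    # affect is assumed to lie in [0, 1].
    rng = rng or np.random.default_rng()
    temperature = t_min + (t_max - t_min) * (1.0 - affect)
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                          # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(q_values), p=probs)    # sampled action index
```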
Cost-Sensitive Exploration in Bayesian Reinforcement Learning
In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected long-term total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in ...
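As a sketch of the constraint the excerpt describes: in a CMDP the agent maximizes expected return subject to a bound on expected total cost. A Lagrangian relaxation is one standard way to fold the constraint into action scoring (an illustrative choice here, not necessarily the paper's solution method):

```python
# score(s, a) = Q_r(s, a) - lambda * Q_c(s, a), with lambda >= 0 tuned or
# adapted so that the expected total cost stays within the budget. Q_r and
# Q_c are separately learned reward-value and cost-value estimates.
def constrained_score(q_reward, q_cost, lagrange_multiplier):
    return q_reward - lagrange_multiplier * q_cost
```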
Soar-RL: integrating reinforcement learning with Soar
In this paper, we describe an architectural modification to Soar that gives a Soar agent the opportunity to learn statistical information about the past success of its actions and utilize this information when selecting an operator. This mechanism serves the same purpose as production utilities in ACT-R, but the implementation is more directly tied to the standard definition of the reinforcemen...
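A minimal sketch of the kind of update described, assuming a SARSA-style temporal-difference rule over operator values (the table layout and names are illustrative):

```python
def operator_td_update(q, state, op, reward, next_state, next_op,
                       alpha=0.1, gamma=0.9):
    # SARSA-style update of an operator's learned numeric preference;
    # q maps (state, operator) identifiers to values that inform selection.
    q.setdefault((state, op), 0.0)
    q.setdefault((next_state, next_op), 0.0)
    td_error = reward + gamma * q[(next_state, next_op)] - q[(state, op)]
    q[(state, op)] += alpha * td_error
```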
Journal: I. J. Robotics Res.
Volume: 34, Issue: -
Pages: -
Publication date: 2015